1. STATEMENT OF THE PROBLEM STUDIED


Our  main  objective  in  this  MURI  project  has  been  to  investigate  foundational  and 
experimental  techniques  for  enabling  real-time,  fault-tolerant  network  protocols. 

Our  overall  research  goal  has  been  to  study  networking  architectures,  services,  and 
algorithms  which  require  innovative  quality-of-service  and  fault-tolerance  mechanisms. 
We  have  focused  on  multimedia  delivery  in  traditional  client-server  architectures,  both  in 
the  case  of  the  Internet  and  wireless  networks,  as  well  as  on  peer-to-peer  content  delivery 
and  on  mobile  ad-hoc  networks. 

The  unique  composition  of  the  team  has  brought  new  synergies  to  the  problem  domain 
which  permit  the  complete  illumination  of  each  newly  proposed  protocol  from  all  angles, 
from  mathematical  modeling  and  analysis  to  experimental  evaluation,  from  real-time  and 
QoS  aspects  to  fault-tolerance  and  reliability  aspects.  Our  approach  is  to  improve  newly 
designed  protocols  through  feedback  from  timing  and  fault  analysis,  and  to  develop  new 
analysis  techniques  driven  by  new  protocol  designs. 

2.  SUMMARY  OF  THE  MOST  IMPORTANT  RESULTS 

2.1  Results  in  Multimedia  Communication 

2.1.1  Receiver  Driven  Bandwidth  Sharing  for  TCP  with  Applications  to  Video 
Streaming 

Applications  using  Transmission  Control  Protocol  (TCP),  such  as  web-browsers,  ftp,  and 
various peer-to-peer (P2P) programs, account for most of the Internet traffic today. In many
cases,  users  have  bandwidth-limited  last  mile  connections  to  the  Internet  which  act  as 
network  bottlenecks.  Users  generally  run  multiple  concurrent  networking  applications 
that  compete  for  the  scarce  bandwidth  resource.  Standard  TCP  shares  bottleneck  link 
capacity  according  to  connection  round-trip  time  (RTT),  and  consequently  may  result  in  a 
bandwidth  partition  which  does  not  necessarily  coincide  with  the  user’s  desires.  In  this 
work,  we  developed  a  receiver-based  bandwidth  sharing  system  (BWSS)  for  allocating 
the  capacity  of  last-hop  access  links  according  to  user  preference.  Our  system  does  not 
require  modifications  to  the  TCP  protocol,  network  infrastructure  or  sending  hosts, 
making it easy to deploy. By breaking fairness between flows on the access link, the
BWSS  can  limit  the  throughput  fluctuations  of  high-priority  applications.  We  utilize  the 
BWSS  to  perform  efficient  video  streaming  over  TCP  to  receivers  with  bandwidth- 
limited  last  mile  connections.  We  have  demonstrated  the  effectiveness  of  our  proposed 
system through Internet experiments [MeVlZa05, NgMeZa03, MeZa03].

2.1.2  Multiple  Sender  Distributed  Video  Streaming 

With  the  explosive  growth  of  video  applications  over  the  Internet,  many  approaches  have 
been  proposed  to  stream  video  effectively  over  packet  switched,  best-effort  networks.  In 
this  work,  we  developed  a  receiver-driven  protocol  for  simultaneous  video  streaming 
from multiple senders to a single receiver in order to achieve higher throughput, and to
increase  tolerance  to  packet  loss  and  delay  due  to  network  congestion.  Our  receiver- 
driven  protocol  employs  a  novel  rate  allocation  algorithm  (RAA)  and  a  packet  partition 
algorithm  (PPA).  The  RAA,  run  at  the  receiver,  determines  the  sending  rate  for  each 
sender  by  taking  into  account  available  network  bandwidth,  channel  characteristics,  and  a 
prespecified,  fixed  level  of  forward  error  correction,  in  such  a  way  as  to  minimize  the 
probability  of  packet  loss.  The  PPA,  run  at  the  senders  based  on  a  set  of  parameters 
estimated  by  the  receiver,  ensures  that  every  packet  is  sent  by  one  and  only  one  sender, 
and  at  the  same  time,  minimizes  the  startup  delay.  Using  both  simulations  and  Internet 
experiments,  we  demonstrate  the  effectiveness  of  our  protocol  in  reducing  packet  loss 
[NgZa04,NgZa03a,NgZa02a,NgZa02b,NgZa02c]. 
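
The requirement that every packet is sent by one and only one sender can be illustrated with a small sketch. This is not the PPA of the cited papers. It assumes, purely for illustration, that every sender knows the per-sender rates chosen by the receiver and applies the same deterministic rule, so that all senders arrive at the same assignment without exchanging messages. The function name `partition_packets` and the earliest-finish rule are hypothetical.

```python
def partition_packets(num_packets, rates_pps):
    """Assign each packet index to exactly one sender.

    rates_pps: sending rate of each sender in packets per second
    (hypothetical values, assumed to be chosen by the receiver).

    Rule (an assumption, not the published PPA): give the next packet to
    the sender that would finish transmitting it earliest. Ties go to the
    lowest sender index so that every sender computes the same answer.
    """
    if not rates_pps or any(r <= 0 for r in rates_pps):
        raise ValueError("need at least one sender with a positive rate")

    next_free = [0.0] * len(rates_pps)  # time at which each sender is idle
    assignment = []
    for _ in range(num_packets):
        # Finish time of the next packet if it were given to sender i.
        finish = [next_free[i] + 1.0 / rates_pps[i]
                  for i in range(len(rates_pps))]
        chosen = min(range(len(rates_pps)), key=lambda i: (finish[i], i))
        next_free[chosen] = finish[chosen]
        assignment.append(chosen)
    return assignment


if __name__ == "__main__":
    # Sender 0 is twice as fast as sender 1.
    print(partition_packets(6, [2.0, 1.0]))
```

Because the rule depends only on the shared rate vector, each sender can run it locally and transmit only the packets assigned to its own index.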

2.1.3  Path  Diversity  for  Unicast  Video  Streaming 

Packet loss and end-to-end delay limit delay-sensitive applications over best-effort
packet-switched networks such as the Internet. In our previous work on distributed video
streaming,  we  have  shown  that  substantial  reduction  in  packet  loss  can  be  achieved  by 
sending  packets  at  appropriate  sending  rates  to  a  receiver  from  multiple  senders,  using 
disjoint  paths,  and  by  protecting  packets  with  forward  error  correction.  In  this  project,  we 
have  developed  a  Path  Diversity  with  Forward  error  correction  (PDF)  system  for  delay 
sensitive applications over the Internet in which disjoint paths from a sender to a receiver
are  created  using  a  collection  of  relay  nodes.  We  have  developed  a  scalable,  heuristic 
scheme  for  selecting  a  redundant  path  between  a  sender  and  a  receiver,  and  show  that 
substantial  reduction  in  packet  loss  can  be  achieved  by  dividing  packets  between  the 
default path and the redundant path. NS simulations have been used to verify the effectiveness
of the PDF system [NgZa03b].

2.1.4  Path  Diversity  for  Overlay  Multicast  Streaming 

In  this  project,  we  developed  a  new  path-diversity  based  scheme  for  application  layer 
multicast  streaming  over  the  Internet.  Rather  than  building  simple  trees  as  in  traditional 
multicast,  we  construct  multicast  k-DAGs,  characterized  by  the  property  that  each 
receiver  has  k  parents.  This  multiplicity  of  parents  not  only  allows  for  streaming  from 
multiple  sources  at  the  same  time,  thereby  de-correlating  losses,  but  also  creates  an 
opportunity  to  dynamically  adapt  streaming  rates  from  these  senders  depending  on  the 
existing  error  conditions  in  the  network.  To  exploit  these  possibilities,  we  use  a  simple 
rate allocation algorithm and a packet-partitioning algorithm that allows a receiver to
coordinate the sending of data among its parents. Our results show that our scheme
is effective in dealing with packet losses in the network, and increases the goodput of
FEC-coded  video  data  by  15-30%  [BaZa04a,BaZa04b]. 

2.1.5  Peer  to  Peer  Systems 

In the area of peer to peer systems, our efforts have been focused on the following:

• We developed a novel architecture and set of protocols (CITADEL) to provide
content  protection  and  propagation  control  in  decentralized  peer-to-peer  systems. 
Our  work  also  shows  how  such  protection  can  improve  the  usability  and  scope  of 
peer-to-peer  systems. 

• We considered the question of providing incentives to participating peers in such
systems. We designed and evaluated 1) a reputation tracking system for peer-to-peer
systems, 2) a technique to provide service differentiation using reputations such as
those provided by our reputation system, and 3) a robust pricing system
which can be used to provide financial incentives for participation.

•  Developed  a  novel  file-centric  model  for  the  evaluation  of  the  performance  of 
peer-to-peer  file  sharing  systems.  The  model  allows  for  the  evaluation  of  large- 
scale  systems.  The  model  is  flexible  and  extensible  and  our  work  demonstrates  its 
use  in  understanding  the  effect  of  various  modes  of  peer  behaviors. 


2.1.6  Other  Aspects  of  Multimedia  Streaming 

Other  results  in  the  general  area  of  multimedia  streaming  include: 

•  Developed  a  novel  technique  for  quality  smoothing  for  layered  video  streaming 
systems  and  demonstrated  its  effectiveness  in  realistic  video  streaming 
experiments. 

•  Analyzed  the  feasibility  of  streaming  video  over  TCP  connections  and  provided 
analytic  evaluation  of  buffer  requirements  based  on  newly  developed  TCP 
throughput  formulas.  Provided  a  systematic  comparison  between  TCP  and  TFRC 
streaming  approaches. 

• Designed and evaluated a novel video streaming system for handling flash crowds
receiving video from a server. The system is distinguished by its use of a simple
single-description encoded video and its ability to trade off minor video pauses for
reduced complexity and overhead.

•  We  developed  a  comparison  methodology  for  layered  and  replicated  stream  video 
multicasting.  Based  on  this  methodology,  we  conducted  a  systematic  comparison 
of the multicasting schemes. We found that the superiority of layered
multicasting is not as clear-cut as is widely believed.

2.1.7  Scheduling  for  Wireless  Streaming 

We  have  developed  a  class  of  rate-distortion  optimized  packet  scheduling  algorithms  for 
streaming  media  by  generating  a  number  of  nested  substreams,  with  more  important 
streams  embedding  less  important  ones  in  a  progressive  manner.  Our  goal  is  to  determine 
the  optimum  substream  to  send  at  any  moment  in  time,  using  feedback  information  from 
the  receiver  and  statistical  characteristics  of  the  video.  To  do  so,  we  model  the  streaming 
system  as  a  queueing  system,  compute  the  run-time  decoding  failure  probability  of  a 
group of pictures in each substream based on the effective bandwidth approach, and determine
the optimum substream to be sent at that moment in time. We evaluate our scheduling
scheme with various video traffic models featuring short-range dependency (SRD),
long-range dependency (LRD), and/or multifractal properties. From experiments with real
video data, we show that our proposed scheduling scheme outperforms the conventional
sequential sending scheme [KaZa05,KaZa03].

2.1.8  Multiple  Description  Video  Coding  for  Wireless  Video 

Multiple  description  coding  (MDC)  is  an  error  resilient  source  coding  scheme  that  creates 
multiple  bitstreams  of  approximately  equal  importance.  The  reconstructed  signal  based 
on  any  single  bitstream  has  an  acceptable  quality.  However,  a  higher  quality 
reconstruction can be achieved with a larger number of bitstreams. We have developed a
two-description video coding scheme based on the three-loop structure originally
proposed in [1]. We adapt the discrete cosine transform structure to the matching
pursuits framework and evaluate the performance gain using maximum likelihood (ML)
enhancement when both descriptions are available. We find that ML enhancement works
best for low-motion sequences and results in gains of up to 1.3 dB in terms of average
PSNR. Rate distortion performance is characterized. Performance comparison is made
between our MDC scheme and single description coding (SDC) schemes over lossy
channels, including two-state Markov channels and Rayleigh fading channels. We find
that MDC outperforms SDC in bursty, slowly varying environments. In the case of
Rayleigh  fading  channels,  interleaving  helps  SDC  close  the  gap  and  even  outperform 
MDC  depending  on  the  amount  of  interleaving  performed,  at  the  expense  of  additional 
delay  [TaZa02,TaZa01]. 
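
As an illustration of the kind of bursty loss pattern a two-state Markov channel produces, the following is a minimal sketch of such a channel as a packet-loss generator. It is not the simulation setup used in the cited work. The parameter names (`p_good_to_bad`, `p_bad_to_good`, `loss_good`, `loss_bad`) and the default loss probabilities are assumptions chosen for illustration.

```python
import random


def simulate_two_state_channel(num_packets, p_good_to_bad, p_bad_to_good,
                               loss_good=0.0, loss_bad=1.0, seed=0):
    """Return a list of booleans, True where a packet is lost.

    The channel starts in the good state. For each packet, a loss is drawn
    with the loss probability of the current state, and then the state may
    change with the given transition probability. All parameter values are
    hypothetical.
    """
    rng = random.Random(seed)
    in_bad_state = False
    losses = []
    for _ in range(num_packets):
        loss_prob = loss_bad if in_bad_state else loss_good
        losses.append(rng.random() < loss_prob)
        flip_prob = p_bad_to_good if in_bad_state else p_good_to_bad
        if rng.random() < flip_prob:
            in_bad_state = not in_bad_state
    return losses


if __name__ == "__main__":
    trace = simulate_two_state_channel(40, p_good_to_bad=0.05,
                                       p_bad_to_good=0.3)
    # Print the trace as a string: 'x' marks a lost packet, '.' a received one.
    print("".join("x" if lost else "." for lost in trace))
```

Small transition probabilities give long runs in each state, which corresponds to the bursty, slowly varying regime mentioned above.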


2.2  Results  in  Wireless  Protocols 

2.2.1  Flow  Control  in  Wireless  Networks 

Rate  control  is  an  important  issue  in  video  streaming  applications  for  both  wired  and 
wireless  networks.  A  widely  accepted  rate  control  method  in  wired  networks  is  equation 
based  rate  control,  in  which  the  TCP  Friendly  rate  is  determined  as  a  function  of  packet 
loss  rate,  round  trip  time  and  packet  size.  This  approach,  also  known  as  TFRC,  assumes 
that  packet  loss  in  wired  networks  is  primarily  due  to  congestion,  and  as  such  is  not 
applicable  to  wireless  networks  in  which  the  bulk  of  packet  loss  is  due  to  error  at  the 
physical layer. In this work, we have developed multiple TFRC connections as an
end-to-end rate control solution for wireless video streaming. We show that this approach not
only avoids modifications to the network infrastructure or network protocol, but also
results in full utilization of the wireless channel. NS-2 simulations, actual experiments
over a 1xRTT CDMA wireless data network, and video streaming simulations using traces
from the actual experiments are carried out to validate and characterize the performance
of  our  proposed  approach  [ChZa06a,ChZa06b,ChZa05a,ChZa05b,ChZa04a,ChZa04b]. 
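
To make the dependence of the TCP-friendly rate on loss rate, round-trip time, and packet size concrete, the sketch below evaluates the throughput equation given in the TFRC specification (RFC 5348), with the common simplifications b = 1 and t_RTO = 4R. The helper `connections_to_fill` is hypothetical: it only shows why a single connection can underutilize a channel whose losses are not caused by congestion, and it is not the connection-control algorithm of the cited papers.

```python
import math


def tfrc_rate(packet_size, rtt, loss_rate, b=1, t_rto=None):
    """TCP-friendly sending rate in bytes per second (RFC 5348 equation).

    packet_size: segment size in bytes
    rtt:         round-trip time in seconds
    loss_rate:   loss event rate, 0 < loss_rate <= 1
    b:           packets acknowledged per ACK (1 is the usual choice)
    t_rto:       retransmission timeout in seconds (defaults to 4 * rtt)
    """
    if not 0.0 < loss_rate <= 1.0:
        raise ValueError("loss_rate must be in (0, 1]")
    if t_rto is None:
        t_rto = 4.0 * rtt
    p = loss_rate
    denominator = (rtt * math.sqrt(2.0 * b * p / 3.0)
                   + t_rto * 3.0 * math.sqrt(3.0 * b * p / 8.0)
                   * p * (1.0 + 32.0 * p ** 2))
    return packet_size / denominator


def connections_to_fill(capacity, packet_size, rtt, loss_rate):
    """Hypothetical helper: number of equal TFRC connections whose combined
    rate reaches the channel capacity (bytes per second), assuming the loss
    rate is caused by channel errors and does not grow with the number of
    connections."""
    return math.ceil(capacity / tfrc_rate(packet_size, rtt, loss_rate))


if __name__ == "__main__":
    rate = tfrc_rate(packet_size=1000, rtt=0.1, loss_rate=0.01)
    print(round(rate), connections_to_fill(300000, 1000, 0.1, 0.01))
```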

2.2.2 Adaptive Packet Scheduling in Wireless Networks

To  handle  short-term  channel  variations,  adaptive  packet  scheduling  is  an  attractive 
course  to  take.  The  packet  scheduling  should  provide  both  (1)  timely  delivery  of  real-time 
(RT) packets and (2) error-free delivery of non-RT packets via fair usage of wireless link
bandwidth  under  a  dynamically-fluctuating  channel  condition.  Packet  transmissions  are 
adaptively  scheduled,  depending  on  the  predicted  channel  condition.  Wireless  channels 
are  known  to  be  time-varying  and  suffer  from  location-dependent  and  bursty  errors.  Since 
the  same  channel  can  simultaneously  be  seen  by  one  mobile  as  bad  and  by  another  as 
good,  this  type  of  adaptive  packet  scheduling  can  be  effective  if  reliable  channel 
condition  prediction  is  available. 

The  condition  of  a  channel  can  be  predicted  based  on  (1)  the  information  obtained  from 
the  physical  layer;  and/or  (2)  a  hand-shaking  mechanism  before  each  packet  transmission. 
In  the  second  approach,  before  a  scheduled  packet  transmission,  the  sender  transmits  a 
control  packet  to  the  receiver,  then  the  receiver  in  turn  replies  with  another  control 
packet. Only if the sender receives a correct reply is the channel between them predicted
to be good, in which case the sender sends the original packet. This type of prediction is effective
since  most  channel  errors  are  bursty  and  short-lived. 

If  the  channel  condition  is  predicted  to  be  bad,  the  original  packet  transmission  can  be 
deferred,  and  the  packet  which  is  scheduled  next  can  be  transmitted  instead  via  a  different 
channel. This deferment and rescheduling serve two different goals in
transmitting RT and non-RT traffic, and take into account their respective QoS
requirements. The basic problem here is how to schedule packet transmissions with a
limited  and  time-varying  bandwidth  assigned  to  each  mobile. 
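
The probe-then-defer behavior described above can be sketched as follows. This is only an illustration under simplifying assumptions: the handshake is abstracted as a function `probe(destination)` that returns True when a correct reply is received, one packet is sent per slot, and no distinction is made between RT and non-RT packets. The names `schedule_slot` and `probe` are hypothetical.

```python
def schedule_slot(queue, probe):
    """Choose one packet to transmit in the current slot.

    queue: list of (packet_id, destination) tuples in scheduled order
    probe: callable taking a destination and returning True if the
           control-packet handshake succeeded (channel predicted good)

    The first packet whose channel is predicted good is removed from the
    queue and returned. Packets ahead of it are deferred and keep their
    relative order. Returns None if every channel is predicted bad.
    """
    for index, (packet_id, destination) in enumerate(queue):
        if probe(destination):
            queue.pop(index)
            return packet_id
    return None


if __name__ == "__main__":
    bad_channels = {"mobile_A"}
    queue = [("p1", "mobile_A"), ("p2", "mobile_B"), ("p3", "mobile_A")]
    sent = schedule_slot(queue, lambda dest: dest not in bad_channels)
    print(sent, queue)
```

A fuller scheduler would also bound how long an RT packet may be deferred before its deadline expires, which is where the QoS requirements of the two traffic classes enter.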

2.2.3  Adaptive  Error  Control  in  Wireless  Networks 

Longer  time-scale  channel  variations  cannot  be  handled  by  adaptive  packet  scheduling. 
For  example,  as  a  receiver  gets  farther  from  the  sender,  the  channel  between  them  will 
get  worse  on  average,  so  they  cannot  maintain  the  communication  quality,  e.g.,  in  terms 
of packet error probability. Power and rate control is an attractive candidate for
handling this type of channel variation effectively. In particular, achieving effective power
control so as to meet end-to-end QoS requirements (especially in multi-hop ad hoc
networks) in a distributed environment is an interesting and challenging problem that we
investigated; this work extends the state of the art of current distributed power control
algorithms, which aim to meet specified link qualities for voice and circuit-mode traffic in
cellular and WLAN systems. Adaptive error control can be used instead. By adapting the
redundancy  level  used  for  error  correction  depending  on  the  channel  condition,  we  were 
able  to  maintain  the  same  error  performance  level  even  under  the  time-varying  channel 
condition. 
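
One way to picture adapting the redundancy level to the channel condition is the sketch below. It assumes an idealized erasure code in which a block of k data packets plus r parity packets is recoverable whenever at most r packets are lost, and it assumes independent packet losses with probability p. Neither assumption comes from the report, and the function names are hypothetical.

```python
from math import comb


def block_failure_prob(n, k, p):
    """Probability that more than n - k of n packets are lost, for
    independent losses with probability p (the block is then unrecoverable
    under the idealized erasure-code assumption)."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(n - k + 1, n + 1))


def min_redundancy(k, p, target, max_parity=64):
    """Smallest number of parity packets that keeps the block failure
    probability at or below target, or None if max_parity is not enough."""
    for parity in range(max_parity + 1):
        if block_failure_prob(k + parity, k, p) <= target:
            return parity
    return None


if __name__ == "__main__":
    # As the channel degrades, more redundancy is needed for the same target.
    for loss in (0.01, 0.05, 0.10):
        print(loss, min_redundancy(k=10, p=loss, target=1e-3))
```

The parity count returned for each loss rate also indicates what fraction of the assigned bandwidth, r / (k + r), goes to error-control redundancy.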

Non-RT  packets  can  rely  on  retransmissions  while  retransmission  can  be  used  in  a 
limited  manner  for  RT  packets  due  to  their  timeliness  requirement.  Adaptive  error  control 
should  (1)  provide  the  required  error  performance  to  RT  packets  and  (2)  maximize 
throughput  performance  of  non-RT  packets  while  achieving  error-free  communications 
via  retransmissions.  The  basic  question  here  is  what  portion  of  the  available  bandwidth 
should be assigned for error-control redundancy.

2.2.4 Adaptive Bandwidth Management in Wireless Networks

Due  to  both  user  mobility  and  channel  condition  fluctuation,  the  available  link  bandwidth 
varies over time. For adaptive applications, whose user-perceived
communication quality (or reward) depends on the assigned bandwidth, an important
problem is how much bandwidth to assign to each connection or application when the
available bandwidth fluctuates. This problem can be geared toward
maximizing the aggregate user-perceived quality, utility, or reward.

To  handle  this,  each  connection  or  application  is  characterized  by  a  utility  curve,  which 
specifies  the  utility  level  as  a  function  of  the  given  bandwidth.  Each  connection  can  also 
specify  its  own  adaptation  constraints,  i.e.,  in  terms  of  how  often  to  adapt,  and  how  much 
to  adapt.  For  example,  upgrading  and  downgrading  a  connection's  bandwidth  too  often 
and  too  drastically  will  not  be  desirable  if  that  connection  is  for  a  video  communication. 
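
A minimal sketch of utility-driven allocation is given below, assuming each utility curve is supplied as a list indexed by discrete bandwidth units. It hands out one unit at a time to the connection with the largest marginal utility gain, which maximizes aggregate utility when every curve is concave and is only a heuristic otherwise. The adaptation constraints mentioned above (how often and how much to adapt) are deliberately left out, and the function name and data layout are assumptions.

```python
def allocate_bandwidth(utility_curves, total_units):
    """Distribute total_units of bandwidth among connections.

    utility_curves[c][u] is the utility of connection c when it holds
    u units of bandwidth, for u = 0 .. len(utility_curves[c]) - 1.
    Returns the number of units assigned to each connection.
    """
    allocation = [0] * len(utility_curves)
    for _ in range(total_units):
        best, best_gain = None, 0.0
        for c, curve in enumerate(utility_curves):
            if allocation[c] + 1 < len(curve):
                gain = curve[allocation[c] + 1] - curve[allocation[c]]
                if gain > best_gain:
                    best, best_gain = c, gain
        if best is None:
            break  # no connection benefits from more bandwidth
        allocation[best] += 1
    return allocation


if __name__ == "__main__":
    video = [0, 5, 8, 9]        # hypothetical utility curve
    data = [0, 3, 5, 6, 6.5]    # hypothetical utility curve
    print(allocate_bandwidth([video, data], total_units=4))
```

When the available bandwidth changes, the same routine can be rerun with the new total; a practical scheme would then limit how far each connection moves from its previous allocation.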

2.2.5  Interplay  Between  Adaptive  Error  Control,  Bandwidth  Management  and 
Packet  Scheduling 

All  of  the  above  three  adaptive  schemes  should  work  synergistically  to  achieve  the 
system  goals.  The  bandwidth  management  scheme  should  work  in  the  highest  layer  by 
determining how much bandwidth should be allocated to each connection based on
information such as the channel condition. This information comes partly from both
adaptive error control and packet scheduling.

Adaptive  error  control  determines  how  strong  error  control  should  be,  thus  determining 
how  much  bandwidth  is  utilized  for  actual  information  transmission  out  of  the  total 
assigned  bandwidth  to  a  connection.  Adaptive  packet  scheduling  works  in  the  lowest 
layer  by  scheduling  actual  packet  transmissions  based  on  the  available  actual  bandwidth 
for  each  connection. 

2.2.6  Handoff  in  Wireless  Networks 

For  the  cellular  wireless  networks,  we  have  developed  new  handoff  schemes  for  CDMA 
systems.  Our  proposed  handoff  schemes  can  significantly  decrease  both  the  number  of 
dropped  handoff  calls  and  the  number  of  blocked  calls  without  degrading  the  quality  of 
communication  service  and  the  soft  handoff  process.  Two  patents  on  these  topics  have 
been  granted. 

We  have  also  studied  the  performance,  availability  and  performability  of  various  handoff 
techniques for different arrival models and load conditions. In general, wireless systems
are characterized by their scarce radio resources, which limit not only the service offering
but  also  the  QoS.  Furthermore,  service  degradation  can  be  caused  by  component  failures, 
software  failures  and  human  errors  in  operation  in  the  wireless  system.  Compared  with 
wired  networks,  wireless  networks  need  to  deal  with  disconnects  due  to  handoff,  noise 
and  interference,  fast  (slow)  fading,  blocked  and  weak  signals  and  run-down  batteries.  In 
addition, the performance and availability of a wireless system are affected by the
outage-and-recovery of its supporting functional units. From the designer's and operator's point of
view, it is of great importance to take these factors into account in an integrated manner. For
wireless cellular networks, we have developed two-level hierarchical models with
handoff and channel failures. Further, for a TDMA system consisting of base repeaters
and a control channel, we have built a hierarchical Markov chain model for automatic
protection switching.

2.2.7 QoS in IEEE 802.11e Supplements

In the area of IEEE 802.11 wireless LANs, we have tackled a variety of problems dealing
with schemes for optimal training of wireless channels and with reliability and dependability
studies of IEEE 802.11 protocols. In particular, we have evaluated the capability of the
enhanced point coordination function (EPCF) for QoS in the IEEE 802.11e supplements,
to  support  VoIP  applications.  Our  results  show  that  in  the  scenario  where  VoIP  calls  are 
made  between  wireline  and  wireless  networks,  the  EPCF  operation  mode  provides  low 
end  to  end  delays  for  voice  calls  and  its  performance  is  not  sensitive  to  background  best 
effort  traffic. 

2.2.8 Shadow Regions for IEEE 802.11b/g

The  presence  of  physical  obstacles  and  radio  interference  results  in  the  so  called  “shadow 
regions”  in  wireless  networks.  When  a  mobile  station  roams  into  a  shadow  region,  it  loses 
its  network  connectivity.  In  cellular  networks,  in  order  to  minimize  the  connection 
unreliability, careful cell planning is required to prevent the occurrence of the shadow
regions in the first place. In 802.11b/g wireless LANs, however, due to the limited
frequency  spectrum,  it  is  not  always  possible  to  prevent  a  shadow  region  by  adding 
another  cell  at  a  different  frequency.  We  have  proposed  the  alternate  approach  of 
tolerating  the  existence  of  "shadow  regions"  as  opposed  to  prevention  in  order  to 
enhance  the  connection  dependability.  A  redundant  access  point  (AP)  is  placed  in  the 
shadow  region  to  serve  the  mobile  stations  that  roam  into  that  region.  To  evaluate  the 
dependability  of  the  network  under  study,  we  have  presented  the  reliability,  availability 
and survivability analysis of the two configurations and compared them with the scheme
with  no  redundancy. 


2.3  Results  in  Mobile  Ad  hoc  Networks 

2.3.1  Multi-Path  Unicast  Streaming  in  Wireless  Ad  hoc  Networks 

In this project, we developed a novel multi-path selection framework for streaming over
wireless  ad  hoc  networks.  Our  approach  is  to  approximately  estimate  the  concurrent 
packet  drop  probability  of  two  paths  by  taking  into  account  the  interference  between 
different  links,  and  to  select  the  best  path  pair  based  on  that  estimation.  We  prove  the 
optimal  path  selection  problem  to  be  NP-hard,  and  propose  a  heuristic  solution,  whose 
performance  is  shown  to  be  close  to  that  of  the  optimal  solution,  while  significantly 
outperforming other heuristic protocols [WeZa7a,WeZa6b,WeZa4a,WeZa4b].

2.3.2 Multiple Tree Video Multicast in Wireless Ad hoc Networks

In  this  project,  we  developed  multiple  tree  construction  schemes  and  routing  protocols  for 
video  streaming  over  wireless  ad  hoc  networks.  The  basic  idea  is  to  split  the  video  into 
multiple parts and send each part over a different tree, with the trees constructed to be
disjoint from one another so as to increase robustness to loss and other transmission
degradations.  Specifically,  we  propose  two  novel  multiple  tree  multicast  protocols.  Our 
first  scheme  constructs  two  disjoint  multicast  trees  in  a  serial,  but  distributed  fashion,  and 
is  referred  to  as  Serial  MDTMR.  It  achieves  reasonable  tree  connectivity  while 
maintaining  disjointness  of  two  trees.  In  order  to  reduce  routing  overhead  and 
construction delay, we further propose a parallel multiple nearly-disjoint multicast trees
protocol, which is also shown to achieve reasonable tree connectivity. Simulations show
that  resulting  video  quality  for  either  scheme  is  significantly  higher  than  that  of  single 
tree  multicast,  with  similar  routing  overhead  and  forwarding  efficiency 
[WeZa07a,WeZa07b,  WeZa6a,WeZa4a,WeZa4c]. 

2.3.3  Highly  Partitioned  and  Sparse  Mobile  Ad  hoc  Networks 

In  the  area  of  highly  partitioned  and  sparse  mobile  ad  hoc  networks,  we  have  developed 
the  following  results: 

•  We  conceived  of  a  new  paradigm  for  message  delivery  utilizing  predictable,  non- 
random  movement  of  some  nodes  called  "Message  Ferries".  This  defines  a  novel 
store-carry-and-forward  delivery  paradigm  that  shows  considerable  promise  in 
such partitioned networks that arise in battlefield and disaster relief
environments. 

•  We  developed  and  evaluated  a  Ferry  routing  algorithm  suitable  for  partitioned 
networks  with  mobile  nodes. 

•  We  have  developed  and  evaluated  two  approaches  for  ferry  routing  to  deal  with 
mobile  nodes.  One  approach  uses  fixed  ferry  movement  with  proactive  node 
movement,  the  other  uses  proactive  ferry  movements. 


2.4  Results  in  Fault  Tolerant  Communication  Algorithms 

We have focused on the development of powerful fault-tolerant communication
algorithms,  the  analysis  of  such  algorithms,  and  the  development  of  semantic  foundations 
and  tools  to  support  the  algorithms  and  analysis  work.  We  made  a  great  deal  of  progress 
in  all  three  directions,  as  documented  in  the  list  of  publications  presented  below.  Here 
are  some  highlights. 

2.4.1  Algorithms 

We  developed  the  Rambo  algorithms  for  implementing  atomic  memory  in  highly 
dynamic networks [RAMBOI, RAMBOII, Gilbert-MS, RAMBOII-tr].

We developed the Virtual Node approach to programming applications in mobile ad hoc
networks [DGLSW, DGLSW-jour, BGLNNS, DGLLN-opodis05, LMN-tr, LMN,
DGLLN-allerton, DLLN, DGLLN-podc05].

This approach defines a new programming abstraction for mobile networks: a
virtual  network  consisting  of  mobile  client  nodes  and  simple  virtual  nodes.  A  virtual 
node  may  be  an  atomic  object,  or  a  timed  or  untimed  I/O  automaton,  and  may  reside  at  a 
fixed  geographical  location  or  may  move  according  to  a  predictable  path.  Our  work 
defines  several  kinds  of  virtual  node  layers  and  shows  how  to  implement  them  in  mobile 
networks.  It  also  shows  how  the  abstraction  can  be  used  to  implement  various 
applications,  including  highly  fault-tolerant  atomic  read/write  memory,  geographical  and 
point-to-point  message  routing,  location  services,  intruder  tracking,  robot  coordination, 
and  vehicle  coordination  (e.g.,  a  virtual  traffic  light). 

We  developed  a  new  strategy  for  fault-tolerant  message  flooding  in  (possibly  mobile) 
sensor networks [LivadasLynch]. The algorithm tolerates a variety of failures and
network  changes,  and  achieves  quite  low  communication  cost.  The  basic  idea  is  to 
combine  two  kinds  of  communication:  flood  newly-acquired  information  immediately, 
while  monitoring  in  the  background  to  detect  when  neighboring  nodes  should  be  brought 
up  to  date. 

We  defined  the  problem  of  "gradient  clock  synchronization"  for  mobile/sensor  networks 
[FanLynch-gradient, FL04, FCL04], which requires that the clocks of nearby processors
always  be  closely  synchronized.  We  proved  a  fundamental  result  saying  that  this 
property  is  impossible  to  achieve  under  standard  network  assumptions.  However,  we 
were  able  to  construct  a  practical  clock  synch  algorithm  that  makes  use  of  GPS  when 
available  and  that  satisfies  the  gradient  property  "almost  always". 

We  have  also  developed  many  other  algorithms  for  problems  such  as  tracking  in  sensor 
networks  [DANL-stalk],  distributed  consensus  in  Byzantine  settings  [ACKM-podc04], 
and clustering in sensor networks [MR04].

2.4.2  Semantics  and  Verifications 

We developed the Timed Input/Output Automata (TIOA) mathematical modeling
framework for analyzing timed systems, such as communication protocols
[KLSV-monograph,KLSV-rtss]. In fact, we completed a comprehensive monograph on the
TIOA model, formulated to be consistent with our recently-developed Hybrid I/O
Automata  (HIOA)  modeling  framework.  The  monograph  includes  the  basic  theory, 
including  composition,  levels  of  abstraction,  rely-guarantee  reasoning,  safety  vs.  liveness, 
region  constructions,  etc.  Everything  is  illustrated  with  simple  examples.  This  is 
intended  to  be  useful  as  a  general  handbook  about  timed  system  modeling,  for  both 
theoreticians  and  system  developers.  It  includes  the  complete  theory,  as  well  as 
suggestions for how to use it to model systems.

We developed techniques for rely-guarantee reasoning for TIOA [KL04]. We developed
techniques for stability analysis using HIOA [MitraLiberzon].

We also developed Probabilistic I/O Automata (PIOA) models, with emphasis on
compositionality properties [CheungLSV-short, CheungLSV-long, LSV-concur]. We
elucidated  the  problems  with  earlier  definitions  for  composition  and  external  behavior  for 
PIOAs:  those  definitions  allow  the  entity  that  schedules  different  components  so  much 
power  that  the  entire  internal  branching  structure  of  the  PIOAs  is  exposed.  Our  new 
solution  to  this  problem  is  to  restrict  the  scheduler's  power  so  that  its  choices  can  depend 
only on externally-visible behavior of the components. As an important first step, we
considered  a  restricted  form  of  PIOA,  "switched  automata",  which  explicitly  control  their 
own  scheduling,  passing  control  from  one  to  the  other  via  special  control  actions.  We 
proved  powerful  compositionality  results  for  this  restricted  model.  More  recently,  we 
have  extended  this  work  to  the  more  general  "task-PIOA"  model,  which  appears  to  be 
suitable  for  modeling  security  protocols. 

2.4.3 Tools

We completed our work on code generation from I/O Automata (IOA) programs
[Tauber, TauberLT, GLTV, TauberGarland, VTTL]. We can now generate runnable code
(Java interacting with MPI) for our LAN, automatically and directly from IOA models
for distributed algorithms. The algorithm models can be proved correct and analyzed for
performance using a range of techniques, including hand analysis, interactive theorem
proving, and model checking. Thus, we essentially have a method of generating verified
code.

We  then  extended  the  IOA  language  to  TIOA,  which  has  features  to  model  time-passage, 
using  algebraic  and  differential  equations  and  inequalities  to  describe  "trajectories"  of 
state  evolution  over  time.  We  have  developed  versions  of  various  tools  to  analyze  TIOA 
programs:  a  simulator,  a  translator  to  the  PVS  theorem-prover,  and  a  translator  to  the 
UPPAAL  model-checker.  We  are  currently  engineering  these  tools,  under  the  auspices  of 
an  AFOSR  STTR,  for  wider  use. 


2.5  Results  in  Design  and  Implementation  of  Real  Time  Software 
2.5.1  Meeting  real  time  requirements  in  software 

Papers related to this topic include [J1, J2, C1, C2, C3, C4, C5, C6, C7, C8]. Below is a
summary of the main results:

Giotto:  We  developed  Giotto,  a  platform-independent  language  for  specifying  software 
for  high-performance  control  applications.  A  Giotto  program  explicitly  specifies  the 
exact  real-time  interaction  of  software  components  with  the  physical  world.  The  Giotto 
compiler  automatically  generates  timing  code  that  ensures  the  specified  behavior  on  a 
given platform. We illustrated the Giotto methodology by reimplementing the controller
for an autonomously flying model helicopter originally developed at ETH Zurich. We
demonstrated  that  Giotto  introduces  a  negligible  overhead,  and  at  the  same  time  increases 
the  reliability  and  reusability  of  the  control  software. 

Schedule  Carrying  Code:  We  introduced  the  paradigm  of  schedule-carrying  code 
(SCC).  A  hard  real-time  program  can  be  executed  on  a  given  platform  only  if  there  exists 
a  feasible  schedule  for  the  real-time  tasks  of  the  program.  Traditionally,  a  scheduler 
determines  the  existence  of  a  feasible  schedule  according  to  some  scheduling  strategy. 
With  SCC,  a  compiler  proves  the  existence  of  a  feasible  schedule  by  generating 
executable  code  that  is  attached  to  the  program  and  represents  its  schedule.  An  SCC 
executable  is  a  real-time  program  that  carries  its  schedule  as  code,  which  is  produced 
once  and  can  be  revalidated  and  executed  with  each  use.  We  evaluated  SCC  both  in 
theory  and  practice.  In  theory,  we  gave  two  scenarios,  of  non-preemptive  and  distributed 
scheduling  for  Giotto  programs,  where  the  generation  of  a  feasible  schedule  is  hard, 
while  the  validation  of  scheduling  instructions  that  are  attached  to  the  programs  is  easy. 
In practice, we implemented SCC and showed that explicit scheduling instructions can
reduce the scheduling overhead by up to 35% and can provide an efficient, flexible, and
verifiable means for compiling Giotto programs on complex architectures, such as the
Time-Triggered Architecture (TTA).
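
The asymmetry between generating and validating a schedule can be illustrated with a minimal sketch. The Python below is not the SCC implementation and does not model E code or Giotto; the `Task` and `Slot` records and the `validate_schedule` function are hypothetical names introduced here for illustration. Assuming a single processor and non-preemptive execution, it checks that an attached schedule runs every task exactly once, inside its release-deadline window, for its full worst-case execution time, and without overlap. The check needs only a sort and one pass, whereas finding such a schedule in the first place is the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    release: int   # earliest permitted start time
    deadline: int  # latest permitted completion time
    wcet: int      # worst-case execution time


@dataclass(frozen=True)
class Slot:
    task: str   # name of the task to run
    start: int  # start time chosen by the schedule generator


def validate_schedule(tasks, slots):
    """Return True if `slots` is a feasible non-preemptive schedule for `tasks`."""
    by_name = {t.name: t for t in tasks}
    seen = set()
    busy_until = None
    for slot in sorted(slots, key=lambda s: s.start):
        task = by_name.get(slot.task)
        if task is None or slot.task in seen:
            return False  # unknown task, or task scheduled twice
        seen.add(slot.task)
        end = slot.start + task.wcet
        if slot.start < task.release or end > task.deadline:
            return False  # runs outside its release-deadline window
        if busy_until is not None and slot.start < busy_until:
            return False  # overlaps the previous task on the processor
        busy_until = end
    return seen == set(by_name)  # every task must be scheduled


# Made-up task set and two candidate schedules.
tasks = [Task("sense", 0, 4, 2), Task("control", 2, 8, 3), Task("actuate", 5, 10, 2)]
good = [Slot("sense", 0), Slot("control", 2), Slot("actuate", 5)]
bad = [Slot("sense", 0), Slot("control", 2), Slot("actuate", 4)]
print(validate_schedule(tasks, good), validate_schedule(tasks, bad))
```

In this toy setting the validator rejects the second schedule because `actuate` starts before its release time and while `control` is still running.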

A  Typed  Assembly  Language  for  Real  Time:  We  presented  a  type  system  for  E  code, 
which  is  an  assembly  language  that  manages  the  release,  interaction,  and  termination  of 
real-time  tasks.  E  code  specifies  a  deadline  for  each  task,  and  the  type  system  ensures 
that the deadlines are path-insensitive. We showed that typed E programs allow, for given
worst-case execution times of tasks, a simple schedulability analysis. Moreover, the
real-time programming language Giotto can be compiled into typed E code. This shows that
typed E code identifies an easily schedulable yet expressive class of real-time programs.
We have extended the Giotto compiler to generate typed E code, and enabled the
run-time system for E code to perform a type and schedulability check before executing the
code.

Event-driven Programming with Logical Execution Times: We presented an
extension  of  Giotto,  called  xGiotto,  for  programming  applications  with  hard  real-time 
constraints.  Like  its  predecessor,  xGiotto  is  based  on  the  LET  (logical  execution  time) 
assumption:  the  programmer  specifies  when  the  outputs  of  a  task  become  available,  and 
the  compiler  checks  if  the  specification  can  be  implemented  on  a  given  platform. 
However, while the predecessor language Giotto was purely time-triggered, xGiotto
also accommodates asynchronous events. Indeed, through a mechanism called event
scoping,  events  are  the  main  structuring  principle  of  the  new  language.  The  xGiotto 
compiler  and  run-time  system  implement  event  scoping  through  a  tree-based  event  filter. 
The  compiler  also  checks  programs  for  determinism  (absence  of  race  conditions)  and 
time  safety  (schedulability). 
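
The LET assumption can be sketched in a few lines. The Python below is a purely time-triggered toy, closer in spirit to Giotto than to xGiotto's event scoping, and it is not taken from either compiler; the function name `let_outputs` and the numbers are assumptions made for illustration. It shows that the instant at which an output becomes visible depends only on the specified logical execution time, not on how long the task actually ran, and that an overrun is reported as a time-safety violation.

```python
def let_outputs(let, exec_times):
    """Return (release, output_visible) instants for successive invocations of one task.

    `let` is the logical execution time of the task; `exec_times` are the actual
    execution times of successive invocations on some platform (made-up values).
    """
    instants = []
    for k, exec_time in enumerate(exec_times):
        release = k * let
        if exec_time > let:
            # The platform cannot honour the specification: time safety is violated.
            raise RuntimeError(f"invocation {k} overruns its logical execution time")
        # The output becomes visible at the end of the LET, not when the task finishes.
        instants.append((release, release + let))
    return instants


# Two platforms with different execution-time jitter produce identical timing.
print(let_outputs(10, [3, 7, 9]))
print(let_outputs(10, [9, 2, 5]))
```

Both calls yield the same release and output instants, which is the sense in which LET programs are platform-independent as long as the time-safety check passes.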

2.5.2  Modeling and analysis of the real time behavior of the system

This work is described in detail in [J3, J4, J5, C15, C16, C17, C18, C19, C20]. Below, we
provide a summary of the main results:

Hybrid  systems:  A  hybrid  system  is  a  dynamical  system  with  both  discrete  and 
continuous  state  changes.  For  analysis  purposes,  it  is  often  useful  to  abstract  a  hybrid 
system  in  a  way  that  preserves  the  properties  being  analyzed  while  hiding  the  details  that 
are  of  no  interest.  We  showed  that  interesting  classes  of  hybrid  systems  can  be  abstracted 
to purely discrete systems while preserving all properties that are definable in temporal
logic.  In  this  way,  we  could  solve  verification  and  control  problems  on  hybrid  systems. 

Real-time  control:  We  argued  that  models  where  a  controller  can  cause  an  action  at  any 
point  in  dense  (rational  or  real)  time  are  problematic,  by  presenting  an  example  where  the 
controller  must  act  faster  and  faster,  yet  causes  no  Zeno  effects  (say,  the  control  actions 
are  at  times  0,  0.5,  1,  1.25,  2,  2.125,  3,  3.0625,  ...).  Such  a  controller  is,  of  course,  not 
implementable  in  software.  Such  controllers  are  avoided  by  formulations  where  the 
controller  can  cause  actions  only  at  discrete  (integer)  points  in  time.  While  the  resulting 
control  problem  is  well-understood  if  the  time  unit,  or  "sampling  rate"  of  the  controller,  is 
fixed  a  priori,  we  defined  a  novel,  stronger  formulation:  the  discrete-time  control  problem 
with  unknown  sampling  rate  asks  if  a  sampling  controller  exists  for  some  sampling  rate. 
We  proved  that  this  problem  is  undecidable. 
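
The action times listed in the example above appear to follow the pattern n, n + 2^-(n+1) for n = 0, 1, 2, ...; this closed form is inferred from the listed values and is not stated in the source. Under that assumption, the short Python sketch below (function names are illustrative) reproduces the sequence and shows why it is non-Zeno yet not implementable: time grows without bound, while the smallest gap between consecutive actions shrinks toward zero, so no fixed sampling period can keep up.

```python
from fractions import Fraction


def action_times(n_periods):
    """Action times n and n + 2^-(n+1) for n = 0 .. n_periods - 1 (assumed pattern)."""
    times = []
    for n in range(n_periods):
        times.append(Fraction(n))
        times.append(Fraction(n) + Fraction(1, 2 ** (n + 1)))
    return times


def min_gap(times):
    """Smallest distance between two consecutive action times."""
    return min(b - a for a, b in zip(times, times[1:]))


times = action_times(4)
print([float(t) for t in times])  # [0.0, 0.5, 1.0, 1.25, 2.0, 2.125, 3.0, 3.0625]
print(float(min_gap(times)))      # 0.0625
```

Exact rational arithmetic is used so that the shrinking gaps are not blurred by floating-point rounding when more periods are generated.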

Masaccio: Based on hybrid systems, we developed Masaccio, a formal model for
real-time components which are built from atomic discrete components (difference
equations)  and  atomic  continuous  components  (differential  equations)  by  parallel  and 
serial  composition,  arbitrarily  nested.  Each  system  component  consists  of  an  interface, 
which  determines  the  possible  ways  of  using  the  component,  and  a  set  of  executions, 
which  define  the  possible  behaviors  of  the  component  in  real  time. 

Assume-guarantee reasoning for real time: The assume-guarantee paradigm is a
powerful  divide-and-conquer  mechanism  for  decomposing  a  verification  task  about  a 
system  into  subtasks  about  the  individual  components  of  the  system.  The  key  to  assume- 
guarantee  reasoning  is  to  consider  each  component  not  in  isolation,  but  in  conjunction 
with  assumptions  about  the  context  of  the  component.  Assume-guarantee  principles  were 
known  for  purely  concurrent  contexts,  which  constrain  the  input  data  of  a  component,  as 
well  as  for  purely  sequential  contexts,  which  constrain  the  entry  configurations  of  a 
component.  We  developed  an  assume-guarantee  principle  for  mixed  parallel-serial 
contexts,  and  for  mixed  discrete-continuous  processes.  This  is  necessary  for  the 
component-based  design  and  analysis  of  embedded  software  systems  which  interact  with 
real-world  environments.  Using  an  example  of  two  cooperating  robots,  we  showed 
refinement  between  a  high-level  model  which  specifies  continuous  timing  constraints  and 
an  implementation  which  relies  on  discrete  sampling. 

2.5.3  Composition of real-time and stochastic components

This work is described in detail in [C9, C10, C11, C12, C13, C14]. Below is a summary of
the main results:

Timed interfaces: We developed a theory of timed interfaces, which is capable of
specifying  both  the  timing  of  the  inputs  a  component  expects  from  the  environment,  and 
the  timing  of  the  outputs  it  can  produce.  Two  timed  interfaces  are  compatible  if  there  is  a 
way  to  use  them  together  such  that  their  timing  expectations  are  met.  The  theory 
provides  algorithms  for  checking  the  compatibility  between  two  interfaces  and  for 
deriving the composite interface; the theory can thus be viewed as a type system for
real-time interaction. Technically, a timed interface is encoded as a timed game between two
players,  representing  the  inputs  and  outputs  of  the  component.  The  algorithms  for 
compatibility  checking  and  interface  composition  are  thus  derived  from  algorithms  for 
solving  timed  games. 

Timed  games:  Timed  games  are  two-person  games  played  in  real  time,  in  which  the 
players  decide  both  which  action  to  play,  and  when  to  play  it.  Timed  games  differ  from 
untimed  games  in  two  essential  ways.  First,  players  can  take  each  other  by  surprise, 
because  actions  are  played  with  delays  that  cannot  be  anticipated  by  the  opponent. 
Second,  a  player  should  not  be  able  to  win  the  game  by  preventing  time  from  diverging. 
We  presented  a  model  of  timed  games  that  preserves  the  element  of  surprise  and  accounts 
for  time  divergence  in  a  way  that  treats  both  players  symmetrically  and  applies  to  all 
omega-regular  winning  conditions.  We  proved  that  the  ability  to  take  each  other  by 
surprise  adds  extra  power  to  the  players.  For  the  case  that  the  games  are  specified  in  the 
style  of  timed  automata,  we  provided  symbolic  algorithms  for  their  solution  with  respect 
to  all  omega-regular  winning  conditions.  We  also  showed  that  for  these  timed  games, 
memory  strategies  are  more  powerful  than  memoryless  strategies  already  in  the  case  of 
reachability  objectives. 

Compositional  models  for  probabilistic  systems:  We  developed  a  compositional  trace- 
based  model  for  probabilistic  systems.  The  behavior  of  a  system  with  probabilistic 
choice  is  a  stochastic  process,  namely,  a  probability  distribution  on  traces,  or  "bundle." 
Consequently,  the  semantics  of  a  system  with  both  nondeterministic  and  probabilistic 
choice  is  a  set  of  bundles.  The  bundles  of  a  composite  system  can  be  obtained  by 
combining  the  bundles  of  the  components  in  a  simple  mathematical  way.  Refinement 
between  systems  is  bundle  containment.  We  achieved  assume-guarantee 
compositionality  for  bundle  semantics  by  introducing  two  scoping  mechanisms.  The  first 
mechanism,  which  is  standard  in  compositional  modeling,  distinguishes  inputs  from 
outputs  and  hidden  state.  The  second  mechanism,  which  arises  in  probabilistic  systems, 
partitions  the  state  into  probabilistically  independent  regions. 


2.6  Results  on  Software  Aging,  Rejuvenation  and  Preventive  Maintenance 

Our  research  on  software  aging  and  software  rejuvenation  focused  on  developing  novel 
methods  and  techniques  to  detect  and  fix  slowly  creeping  problems  in  computer  systems 
before  they  build  up  and  cause  the  system  to  hang  or  crash.  The  software  aging 
phenomenon has been observed in personal computers, safety-critical systems, and
high-availability systems, and dealing with it is critical to system availability and
performance.

Software  rejuvenation  is  a  means  of  gracefully  terminating  an  application,  without 
corrupting  or  losing  data,  and  restarting  it  in  a  clean  state.  Our  papers  were  the  first  to 
clearly  document  the  phenomenon  of  software  aging  based  on  real  system  data  and  we 
have  proposed  new  techniques  to  determine  optimal  times  for  software  rejuvenation  to 
obtain  maximum  availability  and  minimum  performance  loss. 

The work on software aging and software rejuvenation has been well received by both
the research community and industry. Our papers in the area of software
rejuvenation  have  been  cited  extensively  by  other  researchers  in  leading  international 
conferences  and  journals,  and  we  have  presented  several  well-attended  tutorials  on  this 
topic. 

In the area of preventive maintenance, we have worked on both time-based and
inspection-based preventive maintenance. Preventive maintenance of operational
software and hardware systems is used specifically to counteract degradation
(a failure rate that increases with time). However, preventive maintenance incurs an overhead
in terms of downtime and cost, and these must be traded off against the cost of failures to
obtain maximum benefit. We have developed analytical models of inspection-based
preventive maintenance using continuous-time Markov chains, semi-Markov processes,
and Markov regenerative processes (MRGP) with a subordinated semi-Markov reward
process, considering preemptive-resume type transitions. For the Markov and
semi-Markov models, we have developed closed-form solutions, while for the MRGP
models we have solved the models numerically. The MRGP models are solved for steady-state as
well as transient conditions, and expressions for expected downtime and expected cost are
derived. Numerical examples illustrate the applicability of the models.
With the help of these models, optimal strategies for preventive maintenance
can be formulated.
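
To make the downtime trade-off concrete, the following Python sketch solves a small continuous-time Markov chain in closed form. It is an illustrative four-state model assumed here (robust, degraded, failed, under maintenance), not one of the models developed in this project, and all rates are made-up values in units of 1/hour. Availability is taken as the steady-state probability of being in the robust or degraded state.

```python
def steady_state(aging, failure, maintenance, repair, restore):
    """Closed-form steady-state probabilities of an assumed 4-state CTMC.

    Transitions: robust -> degraded (rate `aging`); degraded -> failed (rate
    `failure`); degraded -> under maintenance (rate `maintenance`);
    failed -> robust (rate `repair`); under maintenance -> robust (rate `restore`).
    """
    # Solve the balance equations with the robust state fixed at 1, then normalise.
    robust = 1.0
    degraded = aging / (failure + maintenance)
    failed = degraded * failure / repair
    maintained = degraded * maintenance / restore
    total = robust + degraded + failed + maintained
    return {
        "robust": robust / total,
        "degraded": degraded / total,
        "failed": failed / total,
        "maintained": maintained / total,
    }


def availability(**rates):
    """Steady-state probability that the system is up (robust or degraded)."""
    p = steady_state(**rates)
    return p["robust"] + p["degraded"]


# Hypothetical rates: failures take an hour to repair, maintenance takes five minutes.
rates = dict(aging=1 / 240, failure=1 / 720, repair=1.0, restore=12.0)
no_pm = availability(maintenance=0.0, **rates)
with_pm = availability(maintenance=1 / 24, **rates)
print(f"availability without maintenance: {no_pm:.6f}")
print(f"availability with maintenance:    {with_pm:.6f}")
```

With these particular rates maintenance improves availability because a planned restart is much shorter than an unplanned repair; with a slower restart the comparison can reverse, which is why the maintenance schedule has to be optimized against downtime and cost.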

Appendix I: Honors, Awards and Press Releases

•  Avideh Zakhor received the Okawa Foundation Prize in 2004 for work on wireless video
communication.

•  Avideh Zakhor was elected Fellow of the IEEE in 2002.

•  Best  paper  award:  T.  Nguyen  and  A.  Zakhor,  "Distributed  Video  Streaming  with 
Forward  Error  Correction"  in  Packet  Video  2002,  Pittsburgh,  April  2002. 

•  Minghua Chen of UC Berkeley received the Eli Jury Award in 2007 for his thesis,
which was sponsored by this MURI.

•  Kishor S. Trivedi was awarded a Fulbright Fellowship from Nov. 2002 to June 2003 and
was Poonam and Prabhu Goel Chair Professor at IIT Kanpur during his sabbatical
from Duke University.

•  Mostafa Ammar was elected ACM Fellow in December 2003.

•  Paul Judge (MURI Fellow who graduated in Dec. 2002) was named to MIT
Technology Review magazine's list of the top 100 young innovators in 2003.

•  Kang  G.  Shin  received  the  following  honors  and  awards: 

o  Stephen  Attwood  Award,  College  of  Engineering,  The  University  of  Michigan 
(the  highest  award/honor  in  the  college)  (2004). 
o  Outstanding  Zhukezhen  Lectureship  Award  (2004). 
o  IEEE  RTC  Technical  Achievement  Award  (2003). 

o  Best  IBM  Research  Papers  in  Computer  Science,  Electrical  Engineering  and 
Math  published  in  2002  (2003). 
o  IEEE  IWQoS  Best  Paper  Award 

o  2003 IEEE Communications Society William R. Bennett Prize Paper Award
o  Distinguished  Alumni  Award  from  College  of  Engineering  of  Seoul  National 
University  (2002). 

•  Nancy  Lynch  received  the  following  honors  and  awards: 

o  Knuth  prize  (2007) 
o  Van  Wijngaarden  prize  (2006) 

o  Technology Review's "10 Emerging Technologies That Will Change the
World" (2003)

o  Elected to National Academy of Engineering (2001)
o  Dijkstra Prize/PODC Influential Paper Award, for "Impossibility of
Distributed Consensus with One Faulty Process", with Fischer and Paterson (2001)

•  Idit  Keidar  received  the  following  honors  and  awards: 

o  Alon Fellowship for Junior Faculty (2001)

o  Awarded a Technion Management Career Development Chair (2001)

•  Alex  Shvartsman  received  the  following  honors  and  awards: 

o  Outstanding  Research  Award  for  Junior  Faculty  at  the  University  of 
Connecticut  (2001) 

o  Recipient of the NSF CAREER Award for 2000-2004

•  Roger  Khazan  was  elected  to  Sigma  Xi,  The  Scientific  Research  Society  (2001) 

•  Michael  Tsai  received  the  Anna  Pogosyants  UROP  Award  (2000) 

•  Rui Fan (with N. Lynch) received the Best Student Paper award at PODC 2004 and
the Best Student Paper award at PODC 2006

•  Seth Gilbert et al.: DISC 2003 paper on Geoquorums invited for special journal issue
(2003)

•  Gregory Chockler received the following honors and awards:

o  Active  Disk  Paxos  paper  was  invited  to  a  special  issue  of  Distributed 
Computing  (2002) 

o  Best  paper  and  best  student  paper  in  Middleware  2000  (2000) 

•  Panayiotis  Mavrommatis  received  Honorable  Mention  from  the  Computing  Research 
Association, for his work on implementing a code generator for IOA specifications of
distributed  algorithms  (2005) 

•  Calvin Newport published the book "How to Win at College: Surprising Secrets for
Success from the Country's Top Students" (2005)

Appendix II: Technology Transitions


•  Minghua Chen transitioned his E-MULTFRC results to the Caltech group headed by
Professor Steven Low.

•  Minghua Chen's E-MULTFRC results were implemented on actual Brew phones to
demonstrate effectiveness on CDMA 1xRTT and EVDO networks. A throughput
increase of up to 50% was observed.

•  A web-based Dropping/Blocking Probability Calculator is available on PI Trivedi's
webpage: www.ee.duke.edu/~kst

•  PI Trivedi's collaborations with Prof. Dohi of Hiroshima University, Prof. Bobbio of
Alexandria (Italy), and Prof. Jalote of IIT Kanpur took place at various levels, including
several co-authored papers and a jointly supervised Ph.D. student (Vibhu Sharma).

•  PI Trivedi collaborated with IBM to implement software rejuvenation in their xSeries
systems.  This  work  with  IBM  is  one  of  the  fastest  technology  transfers  our  research 
group  has  done.  IBM,  Sun  and  Motorola  have  supported  several  summer  internships 
in  this  area.  We  have  proposed  to  Motorola  to  implement  our  ideas  in  their  Cable 
Modem  Termination  System.  One  Ph.D.  dissertation  (Kalyan  Vaidyanathan's) 
supported  in  part  by  this  grant  has  been  completed  on  this  topic.  Sun  Microsystems  is 
continuing  to  take  this  research  forward.  One  patent  on  this  topic  jointly  with  Kenny 
Gross  of  Sun  Microsystems  has  been  granted. 

•  K. Vaidyanathan graduated in Nov. 2002 and joined Sun Microsystems.

•  Dongyan Chen graduated in March 2003 and joined Mitsubishi Electric Research
Laboratories.

•  S. Dharmaraja first moved to TRLabs, Winnipeg, Canada and then joined the
Dept. of Mathematics, I.I.T. Delhi, India, as an Assistant Professor.

•  X. Ma joined the faculty of Oral Roberts University.

•  Y.  Cao  graduated  and  joined  OPNET  Technologies. 

•  W. Xie graduated and joined AT&T.

•  PI  Ammar’s  efforts  in  the  Message  Ferrying  work  ultimately  seeded  a  much  larger 
effort  in  the  area  of  Disruption  Tolerant  Networks.  This  work  is  now  funded  by 
contracts with DARPA as well as NSF.

•  PI Shin contributed our results via an industry partner, Philips Research USA, to
IEEE 802.11 Standards Working Groups.

•  Transitions  by  MIT: 

Customer:  University  of  Connecticut 
Contact:  Dr.  Alex  Shvartsman 

Results:  Our  work  on  the  Rambo  algorithms  for  reconfigurable  atomic  memory  in 
dynamic  networks. 

Applications:  Continued  this  work  by  producing  several  engineering  improvements  on 
the  algorithm  and  implementing  the  results  in  a  LAN.  Work  involved  Peter  Musial, 
Vincent  Gramoli,  Chryssis  Georgiou,  and  others. 

Keyword:  Theory 

Customer:  Hewlett-Packard 

Contact: Jeannie R. Albrecht, Yasushi Saito

Results: Our work on Rambo algorithms for reconfigurable atomic memory in dynamic
networks.

Applications:  Continued  this  work  with  the  intention  of  using  it  in  HP  systems.  Wrote 
the paper "Rambo for Dummies".

Keyword:  Concept 

Customer: IRISA/INRIA
Contact: Dr. Vincent Gramoli
Results: Rambo algorithms.

Applications:  New  ideas  on  using  reconfigurable  quorum  systems  in  peer-to-peer 
systems.  For  example:  Peer-to-peer  Architecture  for  Self  Atomic  Memory;  SQUARE: 
Scalable Quorum-Based Atomic Memory with Local Reconfiguration. These were
outgrowths  of  Gramoli's  work  with  Alex  Shvartsman  and  ourselves  on  Rambo. 

Keyword:  Theory 

Customer:  SUNY  Buffalo 

Contact:  Dr.  Murat  Demirbas 

Results:  Our  collision  detector  models  and  results. 

Applications:  Generalized  and  implemented  MAC  layer  protocols  based  on  our  collision 
detector results. For example: ROBCAST: A Reliable MAC Layer Protocol for
Broadcast  in  Wireless  Sensor  Networks;  A  MAC  Layer  Protocol  for  Priority-based 
Reliable  Multicast  in  Wireless  Ad  Hoc  Networks;  A  Transactional  Framework  for 
Programming  Wireless  Sensor/Actor  Networks.  These  were  direct  outgrowths  of  our 
work  with  Murat  on  wireless  networks. 

Keyword:  Concept,  Methodology 

Customer:  Systems  research  community 

Results:  Our  theoretical  treatment  of  Brewer's  Conjecture. 

Applications:  This  work  is  frequently  cited  in  the  systems  literature  in  the  context  of 
motivation  for  understanding  data  consistency  models  that  are  weaker  than  atomicity. 
Keyword:  Theory 

Customer:  Ben-Gurion  University 
Contact:  Dr.  Shlomi  Dolev 
Results:  Our  work  on  Geoquorums. 

Applications:  As  a  direct  follow-up  on  our  work  on  Geoquorums,  Dolev  and  co-workers 
designed new, improved quorum systems for use with a geography-aware mobile
network. For example: Geographic Quorum System Approximations.

Keyword:  Theory,  Concept 

Customer:  Texas  A&M  U. 

Contact:  Dr.  Jennifer  Welch 
Results:  Geoquorums. 

Applications:  As  a  follow-up  to  our  work  on  Geoquorums,  Chen  and  Welch  developed 
new uses of network density to facilitate computation in mobile networks. For example:
Location-based  broadcasting  for  dense  mobile  ad  hoc  networks. 

Keyword:  Theory,  Concept 

Customer:  Lincoln  Laboratories 
Contact:  Dr.  Roger  Khazan 

Results: Our work on group communication system design.

Applications: Used our work on group communication system design to develop a
"chat" system for use by the Air Force.

Keyword:  Concept,  Methodology 

Customer:  Lincoln  Laboratories 
Contact:  Dr.  Roger  Khazan 

Results:  Our  work  on  group  communication  and  on  analysis  of  security  protocols. 
Applications:  As  an  outgrowth  of  our  work  on  group  communication  and  on  analysis  of 
security  protocols,  Dr.  Khazan  worked  with  us  to  develop  practical  group  key 
management  schemes,  and  to  use  these  schemes  to  develop  new  secure  group 
communication  applications. 

Keyword:  Concept 

Customer:  ETH  Zurich 
Contact:  Dr.  Roger  Wattenhofer 

Results:  Our  lower  bound  result  for  gradient  clock  synchronization. 

Applications:  Meier  and  Thiele  extended  our  lower  bound  result  for  gradient  clock 
synchronization  to  obtain  a  better  bound,  for  a  more  restrictive  class  of  "oblivious" 
algorithms.  Locher  and  Wattenhofer  extended  our  results  by  obtaining  a  nontrivial  upper 
bound  for  gradient  clock  synchronization. 

Keyword:  Theory 

Customer:  Microsoft  Research  Asia 
Contact:  Dr.  Wei  Chen 

Results:  Our  brand-new  results  on  weakest  failure  detectors  (by  Guerraoui,  Kouznetsov, 
Lynch,  and  Newport). 

Applications:  Already  extending  our  results  to  different  classes  of  detectors  and  to  obtain 
sharper  results. 

Keyword:  Theory 

Customer:  VeroModo,  Inc. 

Contact:  Dr.  Alex  Shvartsman 
Results: Our basic TIOA modeling work.

Applications:  Transitioning  our  basic  TIOA  modeling  work  so  that  it  can  be  used  by 
practical  communication  system  and  hybrid  system  designers,  as  well  as  by  teachers  and 
distributed  systems  researchers. 

Keyword:  Theory,  Methodology 

Customer:  Naval  Research  Laboratory 
Contact: Dr. Myla Archer
Results: Our Tempo/TIOA language.

Applications:  Used  our  Tempo/TIOA  language  as  a  basis  for  developing  new  tools  for 
system verification, using PVS and the NRL-developed TAME interface.

Keyword:  Theory,  Concept,  Methodology,  Code 

Customer:  Cisco  Systems 
Contact:  Dr.  Ralph  Droms 

Results:  Our  TIOA  language  and  modeling  tools,  plus  our  techniques  for  decomposing 
and  analyzing  distributed  algorithms. 

Applications:  Joint  work  with  ourselves,  funded  by  Cisco,  applied  our  TIOA  language 
and modeling tools, plus our techniques for decomposing and analyzing distributed
algorithms, to analyze the correctness and performance of the DHCP communication
protocol.

Keyword:  Methodology 

Customer:  Lehman  College,  CUNY 

Contact:  Dr.  Nancy  Griffeth 

Results:  Our  Tempo/TIOA  language  and  tools. 

Applications:  Using  our  Tempo/TIOA  language  and  tools  in  teaching  courses  on 
network  protocols  and  distributed  algorithms.  Also  using  Tempo  in  research  in  network 
testing  and  automatic  protocol  generation. 

Keyword:  Theory,  Concept,  Methodology,  Code 

Customer:  NASA  Langley 
Contact:  Dr.  Cesar  Munoz 

Results:  We  modeled  and  analyzed  the  SATS  landing  protocol,  which  was  developed  by 
NASA. 

Applications:  We  are  working  this  summer  on  extending  the  analysis  to  additional 
NASA  flight  control  protocols. 

Keyword:  Methodology 

Customer:  Stony  Brook  U. 

Contact:  Dr.  Scott  Smolka 
Results:  Tempo/TIOA 

Applications:  Using  Tempo/TIOA  in  modeling  and  analyzing  biological  systems 
involving  heart  muscle  cells.  Also,  constructing  a  model-checker  using  TIOA. 

Keyword:  Methodology,  Code 

Customer:  Radboud  University,  Nijmegen 

Contact:  Dr.  Frits  Vaandrager 

Results:  TIOA  and  the  related  model  HIOA. 

Applications:  Has  used  TIOA  and  the  related  model  HIOA  as  the  basis  for  many  system 
modeling  and  analysis  case  study  projects.  Currently  using  TIOA  as  a  basis  for  a  new 
proposed  project  on  formal  methods  for  distributed  system  modeling  and  analysis. 
Keyword:  Theory,  Concept,  Methodology,  Code 

Customer:  American  University,  Beirut 
Contact:  Dr.  Paul  Attie 
Results:  Tempo/TIOA 

Applications:  Using  Tempo/TIOA  as  a  vehicle  for  developing  and  implementing  new 
abstraction techniques for analyzing safety and liveness properties of distributed systems.
Keyword:  Theory,  Concept,  Methodology,  Code 

Customer:  Universite  catholique  de  Louvain,  Belgium 
Contact:  Dr.  Olivier  Pereira 

Results:  Our  security  protocol  modeling  framework  and  analysis  techniques. 
Applications:  Used  our  security  protocol  modeling  framework  and  analysis  techniques 
to  perform  computational  security  analysis  of  electronic  voting  protocols. 

Keyword:  Theory,  Concept,  Methodology 

Customer:  NTT  Communication  Science  Laboratories,  Japan. 

Contact: Dr. Tadashi Araragi

Results: Our foundational model to develop automated verification techniques for
security protocols.

Applications:  Used  our  foundational  model  to  develop  automated  verification 
techniques  for  security  protocols. 

Keyword:  Theory,  Concept,  Methodology 

Customer:  Vanderbilt  University 

Contact:  James  Hill 

Results:  Our  Tempo/TIOA  tools. 

Applications:  Using  the  tools  in  developing  his  own  modeling  and  analysis  tools,  for 
large-scale distributed system applications.

Appendix III: Patents and Inventions

1.  Methods and systems for determining an optimal training interval in a
communications  system,  US  7,092,437  B2,  filed  on  April  25,  2003;  granted  on 
August  15,  2006;  Kishor  Trivedi  with  Dongyan  Chen. 

2.  Methods  and  systems  for  improving  utilization  of  traffic  channels  in  a  mobile 
communications  network,  US  7,099,672,  filed  on  February  6,  2002;  granted  on 
August 29, 2006; Kishor Trivedi with Xiaomin Ma and Yun Liu.

3.  Method  and  apparatus  for  using  pattern-recognition  to  trigger  software  rejuvenation, 
US 7,100,079, filed on October 22, 2002; granted on August 29, 2006; Kishor Trivedi
with  Kenny  Gross. 

4.  Model-Based  Software  Design  and  Validation  by  Stephen  J.  Garland  and  Nancy  A. 
Lynch. US 6,289,502 B1, September 11, 2001.

5.  Optimization  of  Streaming  Data  Throughput  in  Unreliable  Networks  by  Minghua 
Chen and Avideh Zakhor.

Appendix IV: Personnel


Principal  Investigators 

Avideh Zakhor (UC Berkeley)